Hi @jfg, your video projections look beautiful! Thank you for sharing. I'll try to answer your questions as best I can:
1. All the ML models supported by MovementOSC provide a confidence score for each keypoint they detect. Confidence is expressed as a floating-point number between 0 (every keypoint, no matter how uncertain) and 1 (complete confidence). Raising the minimum score requires a keypoint to meet that level of confidence before it is sent as recognized over OSC, so moving the threshold closer to 1.0 gives you a way of filtering out less accurate, noisier keypoints.
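To make the thresholding idea concrete, here's a rough Python sketch of the logic (purely illustrative, not MovementOSC's actual code; the keypoint names and values are made up):

```python
# Illustrative sketch of minimum-score filtering (not MovementOSC's
# actual code). Raising MIN_SCORE toward 1.0 suppresses noisier keypoints.
MIN_SCORE = 0.4

keypoints = [
    {"name": "nose", "x": 0.51, "y": 0.22, "score": 0.93},
    {"name": "left_wrist", "x": 0.30, "y": 0.61, "score": 0.18},
]

for kp in keypoints:
    if kp["score"] >= MIN_SCORE:
        print(f"send {kp['name']} at ({kp['x']}, {kp['y']})")
    else:
        print(f"suppress {kp['name']} (score {kp['score']} below threshold)")
```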
2. Yes, it's normal that the MoveNet Multi-Pose model does not provide the z axis. Only the MediaPipe BlazePose model provides depth data. As I understand it, BlazePose was trained with synthetic data from 3D-rendered images, so I don't know how accurate that data is in practice. All other models will return NaN for the z axis.
I've designed the OSC output of MovementOSC to be stable, consistent, and monomorphic across all the models. This makes it more amenable to use in statically-typed languages and environments, where having the number of keypoints change on the fly when you select a different model would cause issues. Regardless of the model, the keypoints are always in the same order, and if a keypoint is not recognized, its x/y/z values will be returned as NaN. Some day I'd really like to create a nice mapping user interface for the software, where you can supply your own format and filter out keypoints as desired, but for now I think this is the least error-prone and most flexible approach.
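As a rough illustration of what receiving this looks like, here's a Python sketch using the python-osc library. The address pattern, port, and flat (x, y, z) argument layout here are assumptions for the example; check MovementOSC's documentation for the actual OSC namespace:

```python
# Sketch of a receiver that copes with a fixed keypoint order where
# unrecognized keypoints arrive as NaN. The "/pose/*" address pattern
# and port 9000 are hypothetical, for illustration only.
import math

from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

def handle_pose(address, *args):
    # Assume a flat list of (x, y, z) triples in a stable order.
    for i in range(0, len(args), 3):
        x, y, z = args[i], args[i + 1], args[i + 2]
        if math.isnan(x) or math.isnan(y):
            continue  # keypoint not recognized in this frame
        has_depth = not math.isnan(z)  # only BlazePose provides z
        print(f"keypoint {i // 3}: x={x:.3f} y={y:.3f} depth={'yes' if has_depth else 'no'}")

dispatcher = Dispatcher()
dispatcher.map("/pose/*", handle_pose)

server = BlockingOSCUDPServer(("127.0.0.1", 9000), dispatcher)
server.serve_forever()
```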
3. There's no way currently to get the coordinates of the box via OSC, but it's something that makes a lot of sense for me to add. I've filed an issue about this, and will look into it when I have an opportunity.
4. The multi-pose model can detect up to six people. Like you, I've noticed that this model has some challenges when bodies overlap a lot, or when lighting conditions are unusual or less than ideal. For example, dancers would sometimes swap pose numbers when they crossed each other.
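If that swapping is a problem for a particular piece, one possible downstream workaround is to assign your own stable IDs by matching each frame's pose centroids to the nearest centroid from the previous frame. A rough Python sketch of that idea (hypothetical helper code, not part of MovementOSC):

```python
# Rough sketch of stabilizing pose IDs on the receiving end via
# nearest-neighbor centroid matching between frames. Illustrative only.
import math

def centroid(keypoints):
    """Mean position of all recognized (non-NaN) keypoints, or None."""
    pts = [(x, y) for x, y in keypoints if not (math.isnan(x) or math.isnan(y))]
    if not pts:
        return None
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))

def match_ids(prev, current, max_dist=0.2):
    """Map each current pose index to the closest previous stable ID."""
    assignments, claimed = {}, set()
    for idx, c in enumerate(current):
        if c is None:
            continue
        best_id, best_d = None, max_dist
        for stable_id, p in prev.items():
            if stable_id not in claimed and p is not None:
                d = math.dist(c, p)
                if d < best_d:
                    best_id, best_d = stable_id, d
        if best_id is not None:
            assignments[idx] = best_id
            claimed.add(best_id)
    return assignments
```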
5. I don't know if there is a maximum distance, and it likely depends on the model used.
6. I've not tried it with infrared light. As far as I understand it, the models were trained on content illuminated with visible light, so their accuracy may degrade on IR images, but it may not. If anyone has the ability to try it out, I'd be very curious to hear the results.
7. For distinguishing bodies from above without doing any kind of keypoint/pose detection, I wonder if a simpler strategy like blob detection might work reasonably well? I do think there's a huge amount of potential for training or fine-tuning custom models for better results, and it's something I'm planning to explore more, but it's a big project and will take time to fund and coordinate.
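For anyone who wants to experiment with that idea, here's a minimal blob detection sketch using OpenCV's SimpleBlobDetector. The parameters are illustrative and would need tuning for a real overhead camera setup:

```python
# Minimal sketch of overhead blob detection with OpenCV -- one way to
# count/locate bodies from above without pose estimation.
import cv2

# Keep only body-sized regions; thresholds would need tuning in practice.
params = cv2.SimpleBlobDetector_Params()
params.filterByArea = True
params.minArea = 2000          # tune to the pixel size of a body from above
params.filterByCircularity = False
params.filterByConvexity = False
params.filterByInertia = False
# Note: by default the detector finds blobs darker than the background;
# set params.blobColor = 255 if bodies read lighter than the floor.
detector = cv2.SimpleBlobDetector_create(params)

cap = cv2.VideoCapture(0)      # overhead camera
ok, frame = cap.read()
cap.release()
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for blob in detector.detect(gray):
        x, y = blob.pt
        print(f"body-sized blob at ({x:.0f}, {y:.0f}), diameter ~{blob.size:.0f}px")
```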
Here is some more information about the models used in MovementOSC:
Google's TensorFlow.js blog about MoveNet: https://blog.tensorflow.org/20...
The model card for BlazePose: https://developers.google.com/...

I hope this helps,
Colin